Supplementary for Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning
Xiaoqian Wu, Shanghai Jiao Tong University, enlighten@sjtu.edu.cn
In Tab. 1, we summarize the notations used in this work for clarity.
Notation: Definition
r: A rule.
The size of the premise symbols set M.
S is the symbol set, and R is the rule set.
A \ B: The set difference of A and B.
D: A very large-scale activity image database.
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Li, Zechen, Chen, Baiyu, Xue, Hao, Salim, Flora D.
Motion sensor time-series are central to human activity recognition (HAR), with applications in health, sports, and smart devices. However, existing methods are trained for fixed activity sets and require costly retraining when new behaviours or sensor setups appear. Recent attempts to use large language models (LLMs) for HAR, typically by converting signals into text or images, suffer from limited accuracy and lack verifiable interpretability. We propose ZARA, the first agent-based framework for zero-shot, explainable HAR directly from raw motion time-series. ZARA integrates an automatically derived pair-wise feature knowledge base that captures discriminative statistics for every activity pair, a multi-sensor retrieval module that surfaces relevant evidence, and a hierarchical agent pipeline that guides the LLM to iteratively select features, draw on this evidence, and produce both activity predictions and natural-language explanations. ZARA enables flexible and interpretable HAR without any fine-tuning or task-specific classifiers. Extensive experiments on 8 HAR benchmarks show that ZARA achieves SOTA zero-shot performance, delivering clear reasoning while exceeding the strongest baselines by 2.53x in macro F1. Ablation studies further confirm the necessity of each module, marking ZARA as a promising step toward trustworthy, plug-and-play motion time-series analysis. Our code is available at https://github.com/zechenli03/ZARA.
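To make the idea of a pair-wise feature knowledge base concrete, the following is a minimal sketch, not ZARA's actual implementation. It assumes three simple window statistics (mean, standard deviation, range) and a hypothetical separation score (gap between class means divided by average spread); the function names `window_features`, `separation`, and `build_pairwise_kb` and the toy data are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev


def window_features(window):
    """Compute a few simple statistics for one 1-D window of sensor values."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "range": max(window) - min(window),
    }


def separation(a_vals, b_vals):
    """Assumed score: gap between class means divided by the average spread."""
    spread = (pstdev(a_vals) + pstdev(b_vals)) / 2
    gap = abs(mean(a_vals) - mean(b_vals))
    if spread > 0:
        return gap / spread
    return float("inf") if gap > 0 else 0.0


def build_pairwise_kb(windows_by_activity):
    """For every activity pair, rank features from most to least separating."""
    feats = {
        act: [window_features(w) for w in windows]
        for act, windows in windows_by_activity.items()
    }
    kb = {}
    for a, b in combinations(sorted(feats), 2):
        scores = {
            name: separation(
                [f[name] for f in feats[a]],
                [f[name] for f in feats[b]],
            )
            for name in feats[a][0]
        }
        kb[(a, b)] = sorted(scores, key=scores.get, reverse=True)
    return kb


# Toy accelerometer-like windows (made-up values, two windows per activity).
toy = {
    "sit": [[0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0]],
    "walk": [[-1.0, 1.0, -1.0, 1.0], [-0.9, 1.1, -1.0, 0.9]],
}
kb = build_pairwise_kb(toy)
print(kb[("sit", "walk")])
```

In this toy case the standard deviation ranks first for the sit/walk pair, which matches the intuition that walking produces much larger signal variation than sitting.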
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network
De Silva, Devin Y., Wickramanayake, Sandareka, Meedeniya, Dulani, Rasnayaka, Sanka
Human Activity Recognition (HAR), which uses data from Inertial Measurement Unit (IMU) sensors, has many practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets that cover a wide range of activities and the lack of transparency in existing HAR models. Zero-shot HAR (ZS-HAR) overcomes the data limitations, but current models struggle to explain their decisions, making them less transparent. This paper introduces a novel IMU-based ZS-HAR model called the Self-Explainable Zero-shot Human Activity Recognition Network (SEZ-HARN). It can recognize activities not encountered during training and provide skeleton videos to explain its decision-making process. We evaluate the effectiveness of the proposed SEZ-HARN on four benchmark datasets (PAMAP2, DaLiAc, HTD-MHAD, and MHealth) and compare its performance against three state-of-the-art black-box ZS-HAR models. The experimental results demonstrate that SEZ-HARN produces realistic and understandable explanations while achieving competitive zero-shot recognition accuracy. SEZ-HARN achieves a zero-shot prediction accuracy within 3% of the best-performing black-box model on PAMAP2 while maintaining comparable performance on the other three datasets.
ADLGen: Synthesizing Symbolic, Event-Triggered Sensor Sequences for Human Activity Modeling
You, Weihang, Jiang, Hanqi, Liu, Zishuai, Xie, Zihang, Liu, Tianming, Lu, Jin, Dou, Fei
Real-world collection of Activities of Daily Living (ADL) data is challenging due to privacy concerns, costly deployment and labeling, and the inherent sparsity and imbalance of human behavior. We present ADLGen, a generative framework specifically designed to synthesize realistic, event-triggered, and symbolic sensor sequences for ambient assistive environments. ADLGen integrates a decoder-only Transformer with sign-based symbolic temporal encoding, and a context- and layout-aware sampling mechanism to guide generation toward semantically rich and physically plausible sensor event sequences. To enhance semantic fidelity and correct structural inconsistencies, we further incorporate a large language model into an automatic generate-evaluate-refine loop, which verifies logical, behavioral, and temporal coherence and generates correction rules without manual intervention or environment-specific tuning. Through comprehensive experiments with novel evaluation metrics, ADLGen is shown to outperform baseline generators in statistical fidelity, semantic richness, and downstream activity recognition, offering a scalable and privacy-preserving solution for ADL data synthesis.
Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion
Ray, Lala Shakti Swarup, Zhou, Bo, Suh, Sungho, Lukowicz, Paul
Conventional human activity recognition (HAR) relies on classifiers trained to predict discrete activity classes, inherently limiting recognition to activities explicitly present in the training set. Such classifiers invariably fail when encountering unseen activities, assigning them zero likelihood. We propose Open Vocabulary HAR (OV-HAR), a framework that overcomes this limitation by first converting each activity into natural language and breaking it into a sequence of elementary motions. This descriptive text is then encoded into a fixed-size embedding. The model is trained to regress this embedding, which is subsequently decoded back into natural language using a pre-trained embedding inversion model. Unlike other works that rely on auto-regressive large language models (LLMs) at their core, OV-HAR achieves open vocabulary recognition without the computational overhead of such models. The generated text can be transformed into a single activity class using LLM prompt engineering. We have evaluated our approach on different modalities, including vision (pose), IMU, and pressure sensors, demonstrating robust generalization across unseen activities and modalities and offering a fundamentally different paradigm from contemporary classifiers.
Raising the Bar(ometer): Identifying a User's Stair and Lift Usage Through Wearable Sensor Data Analysis
Karande, Hrishikesh Balkrishna, Shivalingappa, Ravikiran Arasur Thippeswamy, Yaici, Abdelhafid Nassim, Haghbin, Iman, Bavadiya, Niravkumar, Burchard, Robin, Van Laerhoven, Kristof
Many users are confronted multiple times daily with the choice of whether to take the stairs or the elevator. Whereas taking the stairs could be beneficial for cardiovascular health and wellness, taking the elevator might be more convenient, but it also consumes energy. By precisely tracking and boosting users' stairs and elevator usage through their wearable, users might gain health insights and motivation, encouraging a healthy lifestyle and lowering the risk of sedentary-related health problems. This research describes a new exploratory dataset to examine the patterns and behaviors related to using stairs and lifts. We collected data from 20 participants while climbing and descending stairs and taking a lift in a variety of scenarios. The aim is to provide insights and demonstrate the practicality of using wearable sensor data for such a scenario. Our collected dataset was used to train and test a Random Forest machine learning model, and the results show that our method is highly accurate at classifying stair and lift operations, with an accuracy of 87.61% and a multi-class weighted F1-score of 87.56% over 8-second time windows. Furthermore, we investigate the effect of various types of sensors and data attributes on the model's performance. Our findings show that combining inertial and pressure sensors yields a viable solution for real-time activity detection.
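As an illustration of how pressure data might be turned into per-window features, here is a minimal sketch rather than the authors' pipeline. It assumes a 1 Hz pressure stream in hPa, non-overlapping 8-second windows, and the standard international barometric formula for converting pressure to altitude; the function names and the toy pressure trace are hypothetical.

```python
def pressure_to_altitude_m(p_hpa, p0_hpa=1013.25):
    """International barometric formula: pressure in hPa to altitude in metres."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))


def sliding_windows(samples, rate_hz, window_s=8.0, step_s=8.0):
    """Split a sample list into fixed-length windows (non-overlapping by default)."""
    size = int(window_s * rate_hz)
    step = int(step_s * rate_hz)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def altitude_change_per_window(pressure_hpa, rate_hz):
    """Altitude difference between the last and first sample of each window."""
    return [
        pressure_to_altitude_m(w[-1]) - pressure_to_altitude_m(w[0])
        for w in sliding_windows(pressure_hpa, rate_hz)
    ]


# Toy trace at an assumed 1 Hz: 8 s of falling pressure (ascent), then 8 s level.
pressure = [1013.25 - 0.1 * i for i in range(8)] + [1012.55] * 8
changes = altitude_change_per_window(pressure, rate_hz=1)
print(changes)
```

A feature like this per-window altitude change could be concatenated with inertial statistics and passed to a classifier such as a Random Forest; the choice of features here is an assumption, not something the abstract specifies.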
Generative Resident Separation and Multi-label Classification for Multi-person Activity Recognition
Chen, Xi, Cumin, Julien, Ramparany, Fano, Vaufreydaz, Dominique
This paper presents two models to address the problem of multi-person activity recognition using ambient sensors in a home. The first model, Seq2Res, uses a sequence generation approach to separate sensor events from different residents. The second model, BiGRU+Q2L, uses a Query2Label multi-label classifier to predict multiple activities simultaneously. The performance of these models is compared to that of a state-of-the-art model in different experimental scenarios, using a state-of-the-art dataset of two residents in a home instrumented with ambient sensors. These results lead to a discussion on the advantages and drawbacks of resident separation and multi-label classification for multi-person activity recognition.
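The difference between multi-label and single-label prediction can be sketched as follows. This is a generic decoding step, not the BiGRU+Q2L architecture: it assumes the classifier emits one logit per activity and that each label is accepted independently when its sigmoid probability reaches a threshold. The label names, logits, and the 0.5 threshold are made up for illustration.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, labels, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one time window in a two-resident home.
labels = ["cooking", "sleeping", "watching_tv"]
predicted = predict_labels([2.0, -3.0, 0.4], labels)
print(predicted)  # ['cooking', 'watching_tv']
```

Because each label is thresholded on its own, two activities performed by different residents at the same time can both be reported, which a softmax over mutually exclusive classes could not do.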
Cross-user activity recognition using deep domain adaptation with temporal relation information
Ye, Xiaozhou, Abdulla, Waleed H., Nair, Nirmal, Wang, Kevin I-Kai
Human Activity Recognition (HAR) is a cornerstone of ubiquitous computing, with promising applications in diverse fields such as health monitoring and ambient assisted living. Despite significant advancements, sensor-based HAR methods often operate under the assumption that training and testing data have identical distributions. However, in many real-world scenarios, particularly in sensor-based HAR, this assumption is invalidated by out-of-distribution (o.o.d.) challenges, including differences from heterogeneous sensors, change over time, and individual behavioural variability. This paper centres on the last of these, exploring the cross-user HAR problem where behavioural variability across individuals results in differing data distributions. To address this challenge, we introduce the Deep Temporal State Domain Adaptation (DTSDA) model, an innovative approach tailored for time series domain adaptation in cross-user HAR. Contrary to the common assumption of sample independence in existing domain adaptation approaches, DTSDA recognizes and harnesses the inherent temporal relations in the data. Therefore, we introduce 'Temporal State', a concept that defines the different sub-activities within an activity, consistent across different users. We ensure these sub-activities follow a logical time sequence through the 'Temporal Consistency' property and propose the 'Pseudo Temporal State Labeling' method to identify the user-invariant temporal relations. Moreover, the design principle of DTSDA integrates adversarial learning for better domain adaptation. Comprehensive evaluations on three HAR datasets demonstrate DTSDA's superior performance in cross-user HAR applications by bridging individual behavioural variability using temporal relations across sub-activities.
Deep Generative Domain Adaptation with Temporal Attention for Cross-User Activity Recognition
Ye, Xiaozhou, Wang, Kevin I-Kai
In Human Activity Recognition (HAR), a predominant assumption is that the data used for training and evaluation are drawn from the same distribution. It is also assumed that all data samples are independent and identically distributed (i.i.d.). In practice, these assumptions often fail, and data distribution discrepancies arise, especially in scenarios such as cross-user HAR. Domain adaptation is a promising approach to address these challenges inherent in cross-user HAR tasks. However, a clear gap in domain adaptation techniques is the neglect of the temporal relation embedded within time series data during the phase of aligning data distributions. Addressing this oversight, our research presents the Deep Generative Domain Adaptation with Temporal Attention (DGDATA) method. This novel method recognises and integrates temporal relations during the domain adaptation process. By combining the capabilities of generative models with the Temporal Relation Attention mechanism, our method improves the classification performance in cross-user HAR. A comprehensive evaluation has been conducted on three public sensor-based HAR datasets targeting different scenarios and applications to demonstrate the efficacy of the proposed DGDATA method.
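To give a rough sense of what attending over time steps means, the sketch below shows generic softmax attention pooling. It is not the Temporal Relation Attention mechanism of DGDATA, whose details the abstract does not give: it simply assumes each time step has a feature vector and a scalar relevance score, and returns the score-weighted average. All names and values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Weight each time step's feature vector by its softmax-normalised score."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Three time steps with 2-D features; equal scores reduce to a plain average.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(features, [0.0, 0.0, 0.0])
print(pooled, weights)
```

With unequal scores, time steps judged more relevant contribute more to the pooled representation, which is the basic way attention lets a model exploit temporal structure instead of treating samples as independent.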